{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "01829769",
   "metadata": {},
   "source": [
    "## 8.4 GoogLeNet"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "faa19aaa",
   "metadata": {},
   "source": [
    "论文可见于：[论文链接](https://arxiv.org/abs/1409.4842)\n",
    "\n",
    "GoogLeNet是在2014年ImageNet竞赛中首次提出的，并在竞赛中获得了最高的分类精度，GoogLeNet和之前介绍过的VGGNet分列比赛的第一名和第二名，而两个模型之间的差距也在毫厘之间。其网络设计受到了LeNet和AlexNet的启发，但在模型架构方面有了很大的改进。\n",
    "\n",
    "GoogLeNet是谷歌研究出来的深度网络结构，为什么不叫“GoogleNet”，而叫“GoogLeNet”，据说是为了向“LeNet”致敬，因此取名为“GoogLeNet”。截止目前为止，论文引用次数已经超过4万。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f3a758a6",
   "metadata": {},
   "source": [
    "### 8.4.1 GoogLeNet基本思想"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61b129ec",
   "metadata": {},
   "source": [
    "GoogLeNet是2014年Christian Szegedy等人提出的一种全新的深度学习结构，在这之前的AlexNet、VGGNet等结构都是通过增大网络的深度来获得更好的训练效果，但层数的增加会带来很多负作用，比如过拟合、梯度消失、梯度爆炸等。GoogLeNet中提出的Inception结构则从另一种角度来提升训练结果：能更高效的利用计算资源，在相同的计算量下同时增加网络的宽度与深度，提取更多的特征，从而提升训练结果。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fe4347db",
   "metadata": {},
   "source": [
    "### 8.4.2 Inception结构"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aeeda49d",
   "metadata": {},
   "source": [
    "在讲GoogLeNet之前，先看一下它的重要组成部分Inception的结构。如下图所示：\n",
    "\n",
    "<img src=\"./images/8-3-1.png\" width=\"50%\"></img>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ca368f3f",
   "metadata": {},
   "source": [
    "典型深度学习图像分类网络存在收敛速度太慢，训练参数太多、训练时间长，容易发生梯度消失和梯度爆炸问题。为了解决上述问题，GoogLeNet模型基于Inception网络构建而成，融合了不同尺度的特征信息，是一种带有稀疏性和具备高性能的网络结构。它将1x1，3x3，5x5的卷积层和3x3的最大池化层的结果堆叠起来，在3x3，5x5的卷积层之前以及3x3最大池化层之后加上了1x1的卷积层降维。Inception网络增加了网络的宽度，融合了不同小尺度的卷积与池化操作，能够有效地捕捉图像中的不同尺度特征，可以达到更好的识别效果，同时还能够有效地减少模型的参数数量。这使得Inception结构在许多不同的图像处理任务中都表现出色。\n",
    "\n",
    "其中1x1卷积核是一种特殊的卷积核，通常用于改变输入张量的维度。在Inception模块中使用1x1卷积核的目的是减少输入张量的通道数，从而减少模型的参数数量。例如，如果输入张量是一个256x256x128的张量，而1x1卷积核的个数是64，那么使用1x1卷积核后输出的张量就是一个256x256x64的张量。可以看到，使用1x1卷积核后输出张量的通道数减少了一半，这也就意味着模型的参数数量也减少了一半。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d68c398f",
   "metadata": {},
   "source": [
    "### 8.4.3 GoogLeNet结构"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43768712",
   "metadata": {},
   "source": [
    "GoogLeNet的整体结构如下图：\n",
    "\n",
    "<img src=\"./images/8-3-2.png\" width=\"100%\"></img>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f8d76d6d",
   "metadata": {},
   "source": [
    "很多同学一看到这个图就晕了，不用着急，其实里面很多元素都是相同的Inception结构，下面梳理一下GoogLeNet的网络结构。\n",
    "\n",
    "* 第一部分是卷积层，包含64个7x7步长为2的卷积核，后接最大池化层。\n",
    "* 第二部分是卷积层，包含64个1x1的卷积核，然后是192个3x3步长为1的卷积核，后接最大池化层。\n",
    "* 第三部分是两个Inception层，后接最大池化层。\n",
    "* 第四部分是五个Inception层，后接最大池化层。\n",
    "* 第五部分是两个Inception层，后接平均池化层。\n",
    "* 第六部分是是输出层，包括1000个神经元的全连接，用于预测图像属于哪一类。\n",
    "\n",
    "这样一梳理其实只有六个部分，是不是就清晰很多了。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ae23f513",
   "metadata": {},
   "source": [
    "其中为了避免梯度消失，网络额外增加了2个辅助的softmax用于向前传导梯度，也可以理解为辅助分类器。辅助分类器是将中间某一层的输出用作分类，并按一个较小的权重加到最终分类结果中，这样相当于做了模型融合，同时给网络增加了反向传播的梯度信号，也提供了额外的正则化，对于整个网络的训练很有裨益。而在实际测试的时候，这两个额外的softmax会被去掉，同学们了解即可，后面代码实现也会暂时省略这个部分。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f015bc1f",
   "metadata": {},
   "source": [
    "GoogLeNet的具体结构及参数量计算参考下图：\n",
    "\n",
    "<img src=\"./images/8-3-3.png\" width=\"80%\"></img>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec0a3905",
   "metadata": {},
   "source": [
    "### 8.4.4 GoogLeNet代码实现"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6380128a",
   "metadata": {},
   "source": [
    "接下来看一下如何用代码实现相关网络结构，为便于理解略作简化。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "0f6d4f48",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 导入必要的库\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.nn.functional as F"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d2d29480",
   "metadata": {},
   "source": [
    "首先定义Inception结构。其中branch就是Inception中的四条路径，经ReLU激活函数后连结组成输出。其中：\n",
    "\n",
    "* in_channels: 表示上一层输入的通道数；\n",
    "* ch1x1: 表示 1x1卷积的个数；\n",
    "* ch3x3red: 表示3x3卷积之前1x1卷积的个数；\n",
    "* ch3x3: 表示3x3卷积的个数；\n",
    "* ch5x5red: 表示5x5卷积之前1x1卷积的个数\n",
    "* ch5x5: 表示5x5卷积的个数；\n",
    "* pool_proj: 表示池化后1x1卷积的个数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "fb415deb",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 定义Inception结构\n",
    "class Inception(nn.Module):\n",
    "\n",
    "    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):\n",
    "        super().__init__()\n",
    "\n",
    "        # 定义四个分支路径\n",
    "        self.branch1 = nn.Conv2d(in_channels, ch1x1, kernel_size=1)\n",
    "        self.branch2 = nn.Sequential(\n",
    "            nn.Conv2d(in_channels, ch3x3red, kernel_size=1),\n",
    "            nn.Conv2d(ch3x3red, ch3x3, kernel_size=3, padding=1)\n",
    "        )\n",
    "        self.branch3 = nn.Sequential(\n",
    "            nn.Conv2d(in_channels, ch5x5red, kernel_size=1),\n",
    "            nn.Conv2d(ch5x5red, ch5x5, kernel_size=3, padding=1)\n",
    "        )\n",
    "        self.branch4 = nn.Sequential(\n",
    "            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),\n",
    "            nn.Conv2d(in_channels, pool_proj, kernel_size=1)\n",
    "        )\n",
    "\n",
    "    def forward(self, x):\n",
    "        branch1 = F.relu(self.branch1(x), inplace=True)\n",
    "        branch2 = F.relu(self.branch2(x), inplace=True)\n",
    "        branch3 = F.relu(self.branch3(x), inplace=True)\n",
    "        branch4 = F.relu(self.branch4(x), inplace=True)\n",
    "\n",
    "        # 连结输出\n",
    "        outputs = [branch1, branch2, branch3, branch4]\n",
    "        return torch.cat(outputs, dim=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "60da60ee",
   "metadata": {},
   "source": [
    "接下来定义GoogLeNet，每个part分别对应前面梳理的网络结构，总共六个部分。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "d35ec9ec",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 定义GoogLeNet的网络结构\n",
    "class GoogLeNet(nn.Module):\n",
    "\n",
    "    def __init__(self, num_classes=1000):\n",
    "        super().__init__()\n",
    "        \n",
    "        self.part1 = nn.Sequential(\n",
    "            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),\n",
    "            nn.ReLU(inplace=True),\n",
    "            nn.MaxPool2d(3, stride=2, padding=1)\n",
    "        )\n",
    "        self.part2 = nn.Sequential(\n",
    "            nn.Conv2d(64, 64, kernel_size=1),\n",
    "            nn.ReLU(inplace=True),\n",
    "            nn.Conv2d(64, 192, kernel_size=3, padding=1),\n",
    "            nn.ReLU(inplace=True),\n",
    "            nn.MaxPool2d(3, stride=2, padding=1)\n",
    "        )\n",
    "        self.part3 = nn.Sequential(\n",
    "            Inception(192, 64, 96, 128, 16, 32, 32),\n",
    "            Inception(256, 128, 128, 192, 32, 96, 64),\n",
    "            nn.MaxPool2d(3, stride=2, padding=1)\n",
    "        )\n",
    "        self.part4 = nn.Sequential(\n",
    "            Inception(480, 192, 96, 208, 16, 48, 64),\n",
    "            Inception(512, 160, 112, 224, 24, 64, 64),\n",
    "            Inception(512, 128, 128, 256, 24, 64, 64),\n",
    "            Inception(512, 112, 144, 288, 32, 64, 64),\n",
    "            Inception(528, 256, 160, 320, 32, 128, 128),\n",
    "            nn.MaxPool2d(3, stride=2, padding=1)\n",
    "        )\n",
    "        self.part5 = nn.Sequential(\n",
    "            Inception(832, 256, 160, 320, 32, 128, 128),\n",
    "            Inception(832, 384, 192, 384, 48, 128, 128),\n",
    "            nn.AdaptiveAvgPool2d((1, 1))\n",
    "        )\n",
    "        self.part6 = nn.Sequential(\n",
    "            nn.Flatten(),\n",
    "            nn.Dropout(0.4),\n",
    "            nn.Linear(1024, num_classes)\n",
    "        )\n",
    "\n",
    "    def forward(self, x):\n",
    "        x = self.part1(x)\n",
    "        x = self.part2(x)\n",
    "        x = self.part3(x)\n",
    "        x = self.part4(x)\n",
    "        x = self.part5(x)\n",
    "        x = self.part6(x)\n",
    "        return x"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0583c18d",
   "metadata": {},
   "source": [
    "### 8.4.5 小结"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "76a129bf",
   "metadata": {},
   "source": [
    "* GoogLeNet在2014年的ImageNet竞赛中获得了最高的分类精度，证明了它在图像分类任务中的优越性。\n",
    "* 使用了Inception模块，能够有效地捕捉图像中的不同尺度特征，而且还能够有效地减少模型的参数数量。\n",
    "* 在Inception模块中使用1x1的卷积核，目的是减少输入张量的通道数，从而减少模型的参数数量。\n",
    "* 训练时使用了两个辅助分类器，作用是增加反向传播的梯度信号，也提供了额外的正则化，帮助主分类器更好地泛化。\n",
    "* 论文的主要思路也是想通过构建密集的块结构来近似最优的稀疏结构，从而达到提高性能而又不大量增加计算量的目的。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
